Introduction

The objective of this project is to construct historical bike lane data in New York City (NYC). In terms of data structure, the goal is to build a yearly panel of bike lanes, with key variables being bike lane class (I protected, II standard, III shared, L link) and bike lane geometry coordinates.

With these data, we would be able to easily compare the state of the bike lane network across time and use it in different analyses, for example studying the impact of the bike lane network on bike accidents or its role in the usage of bike-share. Precise data on lane class are crucial, as previous research has shown that protected bike lanes have a substantial effect on rider experience and are therefore the ones with the most potential for change. Furthermore, accurate location and timely reporting help us design more precise and unbiased statistical strategies, which in turn enable more exact estimation of policy effects.

Data sources and differences

NYC has two main sources of bike lane data:

  • NYC Department of Transportation’s (DOT) Bicycle Routes
  • NYC Department of City Planning’s (DCP) LION (I will use “LION” and “DCP” interchangeably to refer to LION-based data).

Key differences in variable availability between the two:

  • DOT reports install date and modified date; LION doesn’t keep bike lane history.
  • DOT reports comments, which may help the interpretation of the modified date variable; no bike lane comments in LION.
  • DOT reports a single bike lane class (see above) in addition to the facility in each direction (from-to and to-from). Facility is more disaggregated than class: each facility seems to belong to a single class, although the DOT data contain a few inconsistencies.
    • The bike lane class is the class of the best facility on the segment: I call that the “best” criterion;
  • LION reports a bike lane class variable (BikeLane), which may be multi-valued, i.e. a segment may be coded with a single class (I or II) but also with several (“I, III” or “II,I”).
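
The “best” criterion can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: it assumes that multi-valued codes are comma-separated, that classes rank I > II > III, and that class L ranks last (a placeholder choice, since the source does not say where link segments fall).

```python
# Rank of each class: a lower number means a better facility.
# Assumption: I > II > III; placing "L" (link) last is a placeholder choice.
CLASS_RANK = {"I": 1, "II": 2, "III": 3, "L": 4}


def best_class(bike_lane):
    """Return the best class in a possibly multi-valued code such as "II,I"."""
    if bike_lane is None:
        return None
    codes = [code.strip() for code in bike_lane.split(",")]
    known = [code for code in codes if code in CLASS_RANK]
    if not known:
        return None  # missing or unrecognized code
    return min(known, key=CLASS_RANK.get)


for raw in ["I, III", "II,I", "III"]:
    print(repr(raw), "->", best_class(raw))
```

Returning None for unrecognized codes keeps such segments visible for manual inspection rather than silently assigning them a class.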

Data transformations

The goal is to build a panel data set in which each row represents a bike lane segment in a given year. The two data sets require different transformations to fit that structure. I briefly describe the main steps taken for each to achieve it and to infer some key variables. (Detailed R scripts are available upon request.)

DOT

  1. Download the 2020 edition of the Bicycle Routes (representing the state of bike lanes at the end of 2019);
  2. Identify the bike lanes with up-/downgrading potential by filtering install_date < modified_date;
  3. For those, infer a new variable, previous class of bike lane, following the criteria described below:
    1. if the comment explicitly mentions a previous facility or class, assign the corresponding class to previous class,
    2. if 1. does not apply but the comment explicitly mentions an upgrade (downgrade), assign the class directly below (above) the current bike lane class to previous class,
    3. if neither 1. nor 2. applies, assign the current class to previous class.
  4. Expand the data set over the years of interest and code, for each year (variable year), the current bike lane class as a function of the install and modified dates.
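
Steps 3 and 4 can be sketched as follows in Python. This is an illustration under stated assumptions rather than the actual scripts: the comment patterns are hypothetical (the real DOT comments may be worded differently), the field names segment_id, bike_lane_class and comment are made up for the example, class L is left out of the ordering, and a segment is assumed to carry its previous class until the end of the year before its modified date.

```python
import re
from datetime import date

ORDER = ["I", "II", "III"]  # best to worst; class "L" is outside this sketch

# Hypothetical wording: the actual DOT comments may phrase this differently.
PREVIOUS = re.compile(r"(?:previously|formerly|was)\s+class\s+(III|II|I)\b", re.IGNORECASE)


def previous_class(current, comment):
    """Infer the class before modification (steps 3.1 to 3.3)."""
    text = comment or ""
    match = PREVIOUS.search(text)
    if match:  # 3.1: the comment names the previous class
        return match.group(1).upper()
    if current in ORDER:
        i = ORDER.index(current)
        if re.search(r"\bupgrad", text, re.IGNORECASE):  # 3.2: upgraded from the class below
            return ORDER[min(i + 1, len(ORDER) - 1)]
        if re.search(r"\bdowngrad", text, re.IGNORECASE):  # 3.2: downgraded from the class above
            return ORDER[max(i - 1, 0)]
    return current  # 3.3: assume no class change


def expand(segment, years):
    """Return one row per year in which the segment exists (step 4)."""
    installed, modified = segment["install_date"], segment["modified_date"]
    before = previous_class(segment["bike_lane_class"], segment["comment"])
    rows = []
    for year in years:
        year_end = date(year, 12, 31)
        if installed > year_end:
            continue  # not installed yet at the end of this year
        # Assumption: the previous class holds until the year of modification.
        changed_later = modified is not None and installed < modified and modified > year_end
        cls = before if changed_later else segment["bike_lane_class"]
        rows.append({"segment_id": segment["segment_id"], "year": year, "bike_lane_class": cls})
    return rows


segment = {
    "segment_id": 1,
    "bike_lane_class": "I",
    "install_date": date(2010, 6, 1),
    "modified_date": date(2014, 9, 15),
    "comment": "Upgraded to protected lane",
}
for row in expand(segment, range(2009, 2016)):
    print(row["year"], row["bike_lane_class"])
```

In this example the segment is absent in 2009, coded II from 2010 to 2013, and coded I from 2014 onward.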

Note: the up-/downgrade criterion is very conservative, but without more information from the DOT on what the modified date represents in general, many assumptions need to be made. Another, slightly less conservative approach would be to tag all the bike lanes that, judging from the comments, were certainly not up-/downgraded, and then apply step 3.2 to all untagged bike lanes, thereby assuming that, by default, bike lanes were upgraded from the class directly below.

DCP/LION

  1. Download a given year of LION (start year);
  2. Keep only segments that have a non-missing BikeLane field;
  3. Retain core variables: BikeLane, SHAPE_Length, SHAPE;
  4. Drop duplicates;
  5. Create a new variable bike lane class that follows the “best” criterion outlined above and used in the DOT data;
  6. Repeat for all desired years.
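
The LION steps above can be sketched in Python as follows. This is a simplified illustration, not the actual pipeline: reading a LION release is replaced by a hypothetical load_lion(year) callable that returns a list of records, geometries are represented as plain text so that duplicates can be detected, and class codes are assumed to be comma-separated with ranking I > II > III.

```python
CLASS_RANK = {"I": 1, "II": 2, "III": 3}  # assumption: I is the best class


def best_class(code):
    """Best class in a possibly multi-valued BikeLane code (step 5)."""
    known = [c.strip() for c in code.split(",") if c.strip() in CLASS_RANK]
    return min(known, key=CLASS_RANK.get) if known else None


def clean_year(records, year):
    """Apply steps 2 to 5 to the records of one LION release."""
    seen, rows = set(), []
    for rec in records:
        bike_lane = rec.get("BikeLane")
        if bike_lane is None or not str(bike_lane).strip():
            continue  # step 2: drop segments without bike lane information
        key = (bike_lane, rec["SHAPE_Length"], rec["SHAPE"])  # step 3: core variables
        if key in seen:
            continue  # step 4: drop duplicates
        seen.add(key)
        rows.append({
            "year": year,
            "BikeLane": bike_lane,
            "SHAPE_Length": rec["SHAPE_Length"],
            "SHAPE": rec["SHAPE"],
            "bike_lane_class": best_class(bike_lane),
        })
    return rows


def build_panel(load_lion, years):
    """Step 6: repeat for every year and stack the results."""
    panel = []
    for year in years:
        panel.extend(clean_year(load_lion(year), year))
    return panel


# Toy stand-in for LION releases; load_lion is hypothetical.
toy_lion = {
    2011: [
        {"BikeLane": "II,I", "SHAPE_Length": 250.0, "SHAPE": "LINESTRING (0 0, 250 0)"},
        {"BikeLane": "II,I", "SHAPE_Length": 250.0, "SHAPE": "LINESTRING (0 0, 250 0)"},
        {"BikeLane": None, "SHAPE_Length": 100.0, "SHAPE": "LINESTRING (0 0, 0 100)"},
    ],
    2012: [
        {"BikeLane": "III", "SHAPE_Length": 80.0, "SHAPE": "LINESTRING (5 5, 85 5)"},
    ],
}
panel = build_panel(toy_lion.get, [2011, 2012])
print(len(panel), [row["bike_lane_class"] for row in panel])
```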

How do DOT and DCP compare

This section compares both data sources and provides statistical and visual illustrations of differences.

Summary statistics

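Summary statistics are computed by year and bike lane class. Lengths are in US feet. In the tables below, length_perc and count_perc are the class shares of the yearly totals (length_tot and count_tot), and growth is the year-over-year growth rate of length within a class.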
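
As a sketch of how the table columns can be derived from a segment-level panel, the Python function below aggregates (year, class, length) records. It is an illustration with toy data, not the script that produced the tables. The growth definition used here (class length divided by the same class's length one year earlier, minus one) is consistent with the DOT table: for class I in 2010, 1132368 / 1094615 − 1 ≈ 0.03.

```python
from collections import defaultdict


def summarize(segments):
    """Aggregate (year, bike_lane_class, length) records into one row per year and class."""
    length = defaultdict(float)
    count = defaultdict(int)
    for year, cls, seg_length in segments:
        length[(year, cls)] += seg_length
        count[(year, cls)] += 1

    length_tot = defaultdict(float)
    count_tot = defaultdict(int)
    for (year, cls), value in length.items():
        length_tot[year] += value
        count_tot[year] += count[(year, cls)]

    rows = []
    for year, cls in sorted(length):
        previous = length.get((year - 1, cls))  # same class, previous year
        rows.append({
            "year": year,
            "bike_lane_class": cls,
            "length": length[(year, cls)],
            "count": count[(year, cls)],
            "length_tot": length_tot[year],
            "count_tot": count_tot[year],
            "length_perc": length[(year, cls)] / length_tot[year],
            "count_perc": count[(year, cls)] / count_tot[year],
            "growth": None if previous is None else length[(year, cls)] / previous - 1,
        })
    return rows


# Toy panel: lengths in feet, one tuple per segment and year.
toy_segments = [
    (2009, "I", 100.0), (2009, "I", 100.0), (2009, "II", 300.0),
    (2010, "I", 100.0), (2010, "I", 100.0), (2010, "I", 50.0), (2010, "II", 300.0),
]
summary = summarize(toy_segments)
for row in summary:
    print(row)
```

The first year of each class has no growth value, which matches the empty growth cells in the first year of each table.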

Tables

DOT

DOT · by year and bike lane class
year  bike_lane_class   length  count  length_tot  count_tot  length_perc  count_perc  growth
2009  I                1094615   3577     2597250      10025         0.42        0.36
2009  II               1133772   4642     2597250      10025         0.44        0.46
2009  III               368863   1806     2597250      10025         0.14        0.18
2010  I                1132368   3767     2744672      10666         0.41        0.35    0.03
2010  II               1204434   4948     2744672      10666         0.44        0.46    0.06
2010  III               407870   1951     2744672      10666         0.15        0.18    0.11
2011  I                1148682   3828     2802669      11022         0.41        0.35    0.01
2011  II               1223284   5086     2802669      11022         0.44        0.46    0.02
2011  III               430703   2108     2802669      11022         0.15        0.19    0.06
2012  I                1184373   3975     2944070      11576         0.40        0.34    0.03
2012  II               1262645   5209     2944070      11576         0.43        0.45    0.03
2012  III               497052   2392     2944070      11576         0.17        0.21    0.15
2013  I                1266710   4353     3223231      12711         0.39        0.34    0.07
2013  II               1359692   5592     3223231      12711         0.42        0.44    0.08
2013  III               596829   2766     3223231      12711         0.19        0.22    0.20
2014  I                1286789   4402     3383226      13384         0.38        0.33    0.02
2014  II               1417885   5844     3383226      13384         0.42        0.44    0.04
2014  III               678552   3138     3383226      13384         0.20        0.23    0.14
2015  I                1347605   4661     3591424      14337         0.38        0.33    0.05
2015  II               1486442   6158     3591424      14337         0.41        0.43    0.05
2015  III               757377   3518     3591424      14337         0.21        0.25    0.12
2016  I                1395414   4932     3848818      15421         0.36        0.32    0.04
2016  II               1647973   6754     3848818      15421         0.43        0.44    0.11
2016  III               805431   3735     3848818      15421         0.21        0.24    0.06
2017  I                1457435   5157     4083474      16405         0.36        0.31    0.04
2017  II               1781638   7317     4083474      16405         0.44        0.45    0.08
2017  III               844401   3931     4083474      16405         0.21        0.24    0.05
2018  I                1499542   5320     4280138      17138         0.35        0.31    0.03
2018  II               1905290   7768     4280138      17138         0.45        0.45    0.07
2018  III               875306   4050     4280138      17138         0.20        0.24    0.04
2019  I                1549166   5575     4520149      18259         0.34        0.31    0.03
2019  II               2044760   8399     4520149      18259         0.45        0.46    0.07
2019  III               926223   4285     4520149      18259         0.20        0.23    0.06

DCP/LION

DCP · by year and bike lane class
year  bike_lane_class   length  count  length_tot  count_tot  length_perc  count_perc  growth
2011  I                 873007   2522     2651973       9780         0.33        0.26
2011  II               1359016   5344     2651973       9780         0.51        0.55
2011  III               419950   1914     2651973       9780         0.16        0.20
2012  I                 879279   2545     2657657       9826         0.33        0.26    0.01
2012  II               1359382   5359     2657657       9826         0.51        0.55    0.00
2012  III               418996   1922     2657657       9826         0.16        0.20    0.00
2013  I                 879688   2579     2658856       9884         0.33        0.26    0.00
2013  II               1359815   5380     2658856       9884         0.51        0.54    0.00
2013  III               419353   1925     2658856       9884         0.16        0.19    0.00
2014  I                 879224   2582     2662017       9964         0.33        0.26    0.00
2014  II               1362960   5431     2662017       9964         0.51        0.55    0.00
2014  III               419833   1951     2662017       9964         0.16        0.20    0.00
2015  I                 883317   2623     2676309      10085         0.33        0.26    0.00
2015  II               1370935   5473     2676309      10085         0.51        0.54    0.01
2015  III               422057   1989     2676309      10085         0.16        0.20    0.01
2016  I                1135083   3720     3567123      13874         0.32        0.27    0.29
2016  II               1572309   6303     3567123      13874         0.44        0.45    0.15
2016  III               859731   3851     3567123      13874         0.24        0.28    1.04
2017  I                1285822   4392     3916221      15487         0.33        0.28    0.13
2017  II               1734775   6989     3916221      15487         0.44        0.45    0.10
2017  III               895624   4106     3916221      15487         0.23        0.27    0.04
2018  I                1344400   4699     4055434      16287         0.33        0.29    0.05
2018  II               1823704   7485     4055434      16287         0.45        0.46    0.05
2018  III               887330   4103     4055434      16287         0.22        0.25   -0.01
2019  I                1345000   4775     4054817      16371         0.33        0.29    0.00
2019  II               1822485   7486     4054817      16371         0.45        0.46    0.00
2019  III               887332   4110     4054817      16371         0.22        0.25    0.00
2020  I                1544227   5688     4534232      18518         0.34        0.31    0.15
2020  II               2061931   8514     4534232      18518         0.45        0.46    0.13
2020  III               928074   4316     4534232      18518         0.20        0.23    0.05

Graphs

DOT

Class shares

Class absolute lengths

Class growth rates

DCP/LION

Class shares

Class absolute lengths

Class growth rates

Maps

Here I map three distinct years (2011, 2015, 2019) of bike lanes with each dataset and highlight differences in spatial extent and categorization.

2011

DOT

DCP

Differences

Spatial extent

Visually, the DOT data seem to have systematically larger spatial coverage. I take the difference between the DOT lines and the DCP lines and map it in red, with the DCP lines in blue, to highlight what the DCP is “missing”.
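
One way to approximate such a difference when the two layers do not coincide exactly is a tolerance-based test, sketched below in plain Python. This is a swapped-in simplification and not the geometric difference used for the maps: lines are lists of (x, y) vertices in feet, the tolerance is an arbitrary assumption, and a DOT line is flagged as missing when at least one of its vertices lies farther than the tolerance from every DCP line.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the straight segment between a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_layer(p, lines):
    """Smallest distance from point p to any segment of any line in the layer."""
    return min(
        (point_segment_distance(p, a, b) for line in lines for a, b in zip(line, line[1:])),
        default=math.inf,
    )


def missing_from(dot_lines, dcp_lines, tolerance):
    """DOT lines with at least one vertex farther than `tolerance` from all DCP lines."""
    return [
        line for line in dot_lines
        if any(distance_to_layer(p, dcp_lines) > tolerance for p in line)
    ]


# Toy layers: one DCP line, one DOT line 5 ft away from it, one DOT line 200 ft away.
dcp = [[(0.0, 0.0), (100.0, 0.0)]]
dot = [[(0.0, 5.0), (100.0, 5.0)], [(0.0, 200.0), (100.0, 200.0)]]
missing = missing_from(dot, dcp, tolerance=20.0)
print(missing)
```

Checking only vertices is crude for long segments, so a real analysis would more likely rely on a GIS library with buffering.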

Categorization

Not done. This requires matching the DOT and DCP layers, but their geometries do not spatially overlap, which complicates the matter. Some work could be done with LION IDs, however.

2015

DOT

DCP

Difference

Spatial extent

2019

DOT

DCP

Difference

Spatial extent

Questions

DOT

  • What is the relationship between bicycle facility types (“Greenway”, “Standard”, “Curbside”, “Sharrows”, etc.) and bicycle lane classes? There are a few inconsistencies in the dataset, but it seems that each facility type falls in a single class.
  • Does modified_date relate to changes in facility types or classes?
  • Are up-/downgrades always noted in the comment section?
    • How much may we trust the comment section?
    • How well do the comments document the full history of a segment?
  • If there are no comments but a modification date, what can we assume?
    • … that it was upgraded from a lower class?
    • … that it was simply worked on and stayed in the same class?
  • Why do instdate and moddate disappear in the 2021 export of the dataset?
    • did the DOT “lose trust” in those variables?

DCP/LION

  • What is the pipeline behind the creation of LION and the BikeLane variable?
    • How did the pipeline change over time? What consequences does that have for the BikeLane data?
    • How much may we trust LION to be representative? Which variables are the most trustworthy and which aren’t?
  • The total length of bike lanes is nearly constant up to 2015…
    • seems that the DOT data were not fed into the LION (pun intended)?
    • why a huge spike in 2016?
  • How stable are SegmentIDs over time?
    • what is LegacyID?
    • how often do the segment IDs change? yearly? quarterly?
  • Is there a map of ZIP codes?
  • 2020d refers to the fourth quarter of 2020: is it representative of the state of LION at the end of that quarter, i.e. 2020-12-31?